The Summarization of Hierarchical Data with Exceptions
Author
Abstract
In many OLAP and data warehousing applications, users need to query data of interest, such as a set of cells that satisfies specific properties. The usual answer to such a query simply enumerates all the interesting cells. This is the most accurate answer but not the most informative one. Summarization is needed to return more concise descriptions of these interesting cells to the users. The MDL (minimum description length) approach has been applied to hierarchical data to obtain concise descriptions. However, in many cases these descriptions are still not concise enough for users. Another method, GMDL, can generate much shorter descriptions, but GMDL descriptions are not truly pure. The motivation of our research is to overcome the disadvantages of these methods. In this thesis, we propose a methodology for summarizing hierarchical data with exceptions. We extend the MDL approach to include exceptions in the description. The exceptions are uninteresting cells. The resulting description with exceptions is pure, which means that it covers only “interesting cells”. We call this new approach MDLE, i.e. MDL with exceptions. It aims to find the shortest description with exceptions that covers all “interesting cells”. First, we study two simple cases that can be solved in polynomial time and give algorithms for them. Second, we prove that MDL with exceptions is NP-hard in the general case and propose three heuristics. Finally, we report experiments that compare MDLE with MDL and GMDL. The experimental results show that MDLE generates more concise descriptions than MDL, and that MDLE produces shorter descriptions than GMDL when the white-ratio is low or when there are some red cells.
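The idea of a pure description with exceptions can be illustrated with a minimal sketch. The abstract does not specify the thesis's algorithms or its exact length measure, so everything here is an assumption: a single tree-shaped hierarchy, a description made of included nodes plus excluded uninteresting leaf cells, and a length equal to the total number of names listed. The names `Node`, `describe`, and the example tree are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    interesting: bool = False  # only meaningful for leaf cells


def _solve(node: Node):
    """Return (includes, exceptions, uninteresting_leaves, has_interesting)."""
    if not node.children:
        if node.interesting:
            return [node.name], [], [], True
        return [], [], [node.name], False

    includes, exceptions, uninteresting, has_interesting = [], [], [], False
    for child in node.children:
        ci, ce, cu, ch = _solve(child)
        includes += ci
        exceptions += ce
        uninteresting += cu
        has_interesting = has_interesting or ch

    # Option A: keep the children's descriptions (cost = names listed so far).
    # Option B: name this node once and except every uninteresting leaf
    # below it, which keeps the description pure.
    cost_children = len(includes) + len(exceptions)
    cost_here = 1 + len(uninteresting)
    if has_interesting and cost_here < cost_children:
        return [node.name], list(uninteresting), uninteresting, True
    return includes, exceptions, uninteresting, has_interesting


def describe(root: Node):
    """Shortest (includes, exceptions) pair under the assumed cost model."""
    includes, exceptions, _, _ = _solve(root)
    return includes, exceptions


# Hypothetical example: e1..e3 and w1 are interesting, e4 and w2 are not.
tree = Node("All", [
    Node("East", [Node("e1", interesting=True), Node("e2", interesting=True),
                  Node("e3", interesting=True), Node("e4")]),
    Node("West", [Node("w1", interesting=True), Node("w2")]),
])

if __name__ == "__main__":
    print(describe(tree))
```

In this example, enumerating the interesting cells takes four names, while the description "East except e4, plus w1" takes three and still covers no uninteresting cell. Under this simplified model a bottom-up pass is enough; the general multi-dimensional problem is the one the abstract states is NP-hard.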
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...
Graph Hybrid Summarization
One solution to process and analysis of massive graphs is summarization. Generating a high quality summary is the main challenge of graph summarization. In the aims of generating a summary with a better quality for a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...
An Integrated Multi-document Summarization Approach based on Word Hierarchical Representation
This paper introduces a novel hierarchical summarization approach for automatic multidocument summarization. By creating a hierarchical representation of the words in the input document set, the proposed approach is able to incorporate various objectives of multidocument summarization through an integrated framework. The evaluation is conducted on the DUC 2007 data set.
Impact of Document Structure on Hierarchical Summarization
Hierarchical summarization technique summarizes a large document based on the hierarchical structure and salient features of the document. Previous study has shown that hierarchical summarization is a promising technique which can effectively extract the most important information from the source document. Hierarchical summarization has been extended to summarization of multiple documents. Thre...
Publication date: 2004